139 research outputs found

    Youla-Kucera parameterized adaptive tracking control for optical data storage systems

    In next-generation optical data storage systems, the tolerance on the tracking error will become even smaller under various unknown working situations. However, unknown external disturbances caused by vibrations make it difficult to maintain the desired tracking precision during normal disk operation. This paper proposes an adaptive regulation approach to keep the tracking error below its desired value despite these unknown disturbances. The design of the regulator is formulated by augmenting a base controller into a Youla-Kucera (Q) parameterized set of stabilizing controllers so that both the deterministic and the random disturbances can be dealt with properly. The adaptive algorithm is developed to search for the desired Q parameter that satisfies the Internal Model Principle, so that exact regulation against the unknown deterministic disturbance can be achieved. The performance of the proposed control approach is evaluated with experimental results that illustrate the capability of the proposed adaptive regulator to attenuate the unknown disturbances and achieve the desired tracking precision.

    Voice conversion versus speaker verification: an overview

    A speaker verification system automatically accepts or rejects a claimed identity of a speaker based on a speech sample. Recently, major progress has been made in speaker verification, which has led to mass-market adoption, such as in smartphones and in online commerce for user authentication. A major concern when deploying speaker verification technology is whether a system is robust against spoofing attacks. Speaker verification studies have provided good insight into speaker characterization, which has contributed to the progress of voice conversion technology. Unfortunately, voice conversion has become one of the most easily accessible techniques for carrying out spoofing attacks; it therefore presents a threat to speaker verification systems. In this paper, we briefly introduce the fundamentals of voice conversion and speaker verification technologies. We then give an overview of recent spoofing attack studies under different conditions, with a focus on voice conversion spoofing attacks. We also discuss anti-spoofing measures for speaker verification.

    PIAVE: A Pose-Invariant Audio-Visual Speaker Extraction Network

    It is common in everyday spoken communication that we look at the turning head of a talker to listen to his/her voice. Humans see the talker to listen better, and so do machines. However, previous studies on audio-visual speaker extraction have not effectively handled the varying talking face. This paper studies how to take full advantage of the varying talking face. We propose a Pose-Invariant Audio-Visual Speaker Extraction Network (PIAVE) that incorporates an additional pose-invariant view to improve audio-visual speaker extraction. Specifically, we generate the pose-invariant view from each original pose orientation, which enables the model to receive a consistent frontal view of the talker regardless of his/her head pose, thereby forming a multi-view visual input for the speaker. Experiments on the multi-view MEAD and in-the-wild LRS3 datasets demonstrate that PIAVE outperforms the state of the art and is more robust to pose variations. Comment: Interspeech 202

    Investigating gated recurrent neural networks for speech synthesis

    Recently, recurrent neural networks (RNNs), as powerful sequence models, have re-emerged as a potential acoustic model for statistical parametric speech synthesis (SPSS). The long short-term memory (LSTM) architecture is particularly attractive because it addresses the vanishing gradient problem in standard RNNs, making them easier to train. Although recent studies have demonstrated that LSTMs can achieve significantly better performance on SPSS than deep feed-forward neural networks, little is known about why. Here we attempt to answer two questions: a) why do LSTMs work well as a sequence model for SPSS; b) which component (e.g., input gate, output gate, forget gate) is most important. We present a visual analysis alongside a series of experiments, resulting in a proposal for a simplified architecture. The simplified architecture has significantly fewer parameters than an LSTM, thus reducing generation complexity considerably without degrading quality. Comment: Accepted by ICASSP 201

    Merlin: An Open Source Neural Network Speech Synthesis System
